{
 "cells": [
  {
   "cell_type": "raw",
   "id": "df29b30a-fd27-4e08-8269-870df5631f9e",
   "metadata": {},
   "source": [
    "---\n",
    "title: Quickstart\n",
    "sidebar_position: 0\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d28530a6-ddfd-49c0-85dc-b723551f6614",
   "metadata": {},
   "source": [
    "In this quick start, we will use LLMs that are capable of **function/tool calling** to extract information from text.\n",
    "\n",
    ":::{.callout-important}\n",
    "Extraction using **function/tool calling** only works with [models that support **function/tool calling**](/docs/modules/model_io/chat/function_calling).\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4412def2-38e3-4bd0-bbf0-fb09ff9e5985",
   "metadata": {},
   "source": [
    "## Set up\n",
    "\n",
    "We will use the new [withStructuredOutput()](/docs/integrations/chat/) method available on LLMs that are capable of **function/tool calling**, along with the popular and intuitive [Zod](https://zod.dev/) typing library.\n",
    "\n",
    "Select a model, install the dependencies for it and set your API keys as environment variables. We'll use Mistral as an example below:\n",
    "\n",
    "```{=mdx}\n",
    "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
    "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
    "\n",
    "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
    "\n",
    "<Npm2Yarn>\n",
    "  @langchain/mistralai zod\n",
    "</Npm2Yarn>\n",
    "```\n",
    "\n",
    "You can also see [this page](/docs/integrations/chat/) for an overview of which models support different kinds of structured output."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54d6b970-2ea3-4192-951e-21237212b359",
   "metadata": {},
   "source": [
    "## The Schema\n",
    "\n",
    "First, we need to describe what information we want to extract from the text.\n",
    "\n",
    "For convenience, we'll use Zod to define an example schema to extract personal information. You may also use JSON schema directly if you wish."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c141084c-fb94-4093-8d6a-81175d688e40",
   "metadata": {},
   "outputs": [],
   "source": [
    "import { z } from \"zod\";\n",
    "\n",
    "// Note that:\n",
    "// 1. Each field is `optional` -- this allows the model to decline to extract it!\n",
    "// 2. Each field uses the `.describe()` method -- this description is used by the LLM.\n",
    "// Having a good description can help improve extraction results.\n",
    "const personSchema = z.object({\n",
    "  name: z.optional(z.string()).describe(\"The name of the person\"),\n",
    "  hair_color: z.optional(z.string()).describe(\"The color of the person's hair, if known\"),\n",
    "  height_in_meters: z.optional(z.string()).describe(\"Height measured in meters\")\n",
    "}).describe(\"Information about a person.\");"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f248dd54-e36d-435a-b154-394ab4ed6792",
   "metadata": {},
   "source": [
    "There are two best practices when defining schema:\n",
    "\n",
    "1. Document the **attributes** and the **schema** itself: This information is sent to the LLM and is used to improve the quality of information extraction.\n",
    "2. Do not force the LLM to make up information! Above we used `Optional` for the attributes allowing the LLM to output `None` if it doesn't know the answer.\n",
    "\n",
    ":::{.callout-important}\n",
    "For best performance, document the schema well and make sure the model isn't force to return results if there's no information to be extracted in the text.\n",
    ":::\n",
    "\n",
    "## The Extractor\n",
    "\n",
    "Let's create an information extractor using the schema we defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a5e490f6-35ad-455e-8ae4-2bae021583ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "import { ChatPromptTemplate, MessagesPlaceholder } from \"@langchain/core/prompts\"\n",
    "\n",
    "// Define a custom prompt to provide instructions and any additional context.\n",
    "// 1) You can add examples into the prompt template to improve extraction quality\n",
    "// 2) Introduce additional parameters to take context into account (e.g., include metadata\n",
    "//    about the document from which the text was extracted.)\n",
    "\n",
    "const SYSTEM_PROMPT_TEMPLATE = `You are an expert extraction algorithm.\n",
    "Only extract relevant information from the text.\n",
    "If you do not know the value of an attribute asked to extract, you may omit the attribute's value.`;\n",
    "\n",
    "const prompt = ChatPromptTemplate.fromMessages([\n",
    "  [\"system\", SYSTEM_PROMPT_TEMPLATE],\n",
    "  // Please see the how-to about improving performance with\n",
    "  // reference examples.\n",
    "  // new MessagesPlaceholder(\"examples\"),\n",
    "  [\"human\", \"{text}\"]\n",
    "]);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "832bf6a1-8e0c-4b6a-aa37-12fe9c42a6d9",
   "metadata": {},
   "source": [
    "We need to use a model that supports function/tool calling.\n",
    "\n",
    "Please review [the chat model integration page](/docs/integrations/chat/) for list of some models that can be used with this API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "04d846a6-d5cb-4009-ac19-61e3aac0177e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import { ChatMistralAI } from \"@langchain/mistralai\";\n",
    "\n",
    "const llm = new ChatMistralAI({\n",
    "  modelName: \"mistral-large-latest\",\n",
    "  temperature: 0,\n",
    "});\n",
    "\n",
    "const extractionRunnable = prompt.pipe(llm.withStructuredOutput(personSchema));"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23582c0b-00ed-403f-a10e-3aeabf921f12",
   "metadata": {},
   "source": [
    "Let's test it out!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "13165ac8-a1dc-44ce-a6ed-f52b577473e4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{ name: \u001b[32m\"Alan Smith\"\u001b[39m, height_in_meters: \u001b[32m\"1.8288\"\u001b[39m, hair_color: \u001b[32m\"blond\"\u001b[39m }"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const text = \"Alan Smith is 6 feet tall and has blond hair.\"\n",
    "\n",
    "await extractionRunnable.invoke({ text });"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd1c493d-f9dc-4236-8da9-50f6919f5710",
   "metadata": {},
   "source": [
    ":::{.callout-important} \n",
    "\n",
    "Extraction is Generative 🤯\n",
    "\n",
    "LLMs are generative models, so they can do some pretty cool things like correctly extract the height of the person in meters\n",
    "even though it was provided in feet!\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28c5ef0c-b8d1-4e12-bd0e-e2528de87fcc",
   "metadata": {},
   "source": [
    "## Multiple Entities\n",
    "\n",
    "In **most cases**, you should be extracting a list of entities rather than a single entity.\n",
    "\n",
    "This can be easily achieved with Zod by nesting models inside one another. Here's an example using the `personSchema` we defined above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "591a0c16-7a17-4883-91ee-0d6d2fdb265c",
   "metadata": {},
   "outputs": [],
   "source": [
    "const dataSchema = z.object({\n",
    "  people: z.array(personSchema)\n",
    "});"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f5cda33-fd7b-481e-956a-703f45e40e1d",
   "metadata": {},
   "source": [
    ":::{.callout-important}\n",
    "Extraction might not be perfect here. Please continue to see how to use **Reference Examples** to improve the quality of extraction, and see the **guidelines** section!\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "cf7062cc-1d1d-4a37-9122-509d1b87f0a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "  people: [\n",
       "    { name: \u001b[32m\"Jeff\"\u001b[39m, hair_color: \u001b[32m\"black\"\u001b[39m, height_in_meters: \u001b[32m\"1.8288\"\u001b[39m },\n",
       "    { name: \u001b[32m\"Anna\"\u001b[39m, hair_color: \u001b[32m\"black\"\u001b[39m }\n",
       "  ]\n",
       "}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const extractionRunnable = prompt.pipe(llm.withStructuredOutput(dataSchema));\n",
    "const text = \"My name is Jeff, my hair is black and i am 6 feet tall. Anna has the same color hair as me.\";\n",
    "await extractionRunnable.invoke({ text });"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fba1d770-bf4d-4de4-9e4f-7384872ef0dc",
   "metadata": {},
   "source": [
    ":::{.callout-tip}\n",
    "When the schema accommodates the extraction of **multiple entities**, it also allows the model to extract **no entities** if no relevant information\n",
    "is in the text by providing an empty list. \n",
    "\n",
    "This is usually a **good** thing! It allows specifying **required** attributes on an entity without necessarily forcing the model to detect this entity.\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f07a7455-7de6-4a6f-9772-0477ef65e3dc",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "Now that you understand the basics of extraction with LangChain, you're ready to proceed to the rest of the how-to guide:\n",
    "\n",
    "- [Add Examples](/docs/use_cases/extraction/how_to/examples): Learn how to use **reference examples** to improve performance.\n",
    "- [Handle Long Text](/docs/use_cases/extraction/how_to/handle_long_text): What should you do if the text does not fit into the context window of the LLM?\n",
    "- [Handle Files](/docs/use_cases/extraction/how_to/handle_files): Examples of using LangChain document loaders and parsers to extract from files like PDFs.\n",
    "- [Without function calling](/docs/use_cases/extraction/how_to/parse): Use a prompt based approach to extract with models that do not support **tool/function calling**.\n",
    "- [Guidelines](/docs/use_cases/extraction/guidelines): Guidelines for getting good performance on extraction tasks."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Deno",
   "language": "typescript",
   "name": "deno"
  },
  "language_info": {
   "file_extension": ".ts",
   "mimetype": "text/x.typescript",
   "name": "typescript",
   "nb_converter": "script",
   "pygments_lexer": "typescript",
   "version": "5.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
